Sort process

ABSTRACT

A DISTRIBUTION SORT PROCESS IS PROVIDED WHICH RESULTS IN THE DISTRIBUTION OF THE RECORDS OF A FILE INTO A PLURALITY OF BUCKETS SUCH THAT THE DISTRIBUTED RECORDS CAN BE RECOVERED IN SEQUENTIAL ORDER OF KEY VALUE IN ONE SORT PASS. IN THE PROCESS, THE TAGS OF THE RECORDS ARE FIRST SORTED INTO SEQUENTIAL ORDER BY KEY VALUE (A TAG INCLUDING A RECORD'S KEY AND ITS ADDRESS IN THE FILE). THE ADDRESS PORTIONS OF THE TAGS AS THEY ARE ARRANGED IN THE TAG SORT ARE THEN SORTED INTO A SET OF NUMBERED SUBSTRINGS BY A MODIFICATION OF A CONVENTIONAL INTERNAL SORT METHOD SUCH AS REPLACEMENT SELECTION. THE SUBSTRINGS THEN ARE MERGED INTO A FINAL STRING. IN THE MERGE, THE STRING NUMBER IS ADDED TO THE TAG, AND THE KEY AND ADDRESS PORTION MAY BE DELETED TO LEAVE A LIST OF STRING NUMBERS WHICH LIE IN THE SAME ORDER AS THE RECORDS IN THE FILE. A DISTRIBUTION SORT IS THEN PERFORMED ON THE RECORDS BY DISTRIBUTING THEM INTO A QUANTITY OF BUCKETS EQUAL TO THE QUANTITY OF SUBSTRINGS PRODUCED IN THE INTERNAL SORT OF THE ADDRESSES, THE BUCKETS BEING NUMBERED TO CORRESPOND TO THE NUMBERING OF THE SUBSTRINGS. THE DISTRIBUTION SORT IS DESIGNED SUCH THAT THE RECORDS DISTRIBUTED TO A GIVEN BUCKET ARE THOSE WHOSE ADDRESSES ARE IN THE SUBSTRING WHICH HAS THE SAME NUMBER AS THE BUCKET. THE BUCKETS ARE ARRANGED IN SEQUENTIAL ORDER OF NUMERICAL VALUE. AT THE COMPLETION OF THE DISTRIBUTION SORT, A SINGLE SORT PASS OF THE DISTRIBUTED RECORDS PRODUCES A STRING OF RECORDS ARRANGED IN SEQUENTIAL ORDER OF KEY VALUE. THE DISTRIBUTION PHASE CAN BE EITHER SINGLE- OR MULTI-PASS. THE INVENTION CONTEMPLATES AN ARRANGEMENT WHERE THE BUCKETS CAN BE VISITED CYCLICALLY FOR DISTRIBUTIONS OF RECORDS THEREINTO OR THEY CAN BE SELECTED FOR VISITATION IN ACCORDANCE WITH CHOSEN CRITERIA. THE INVENTION ENABLES THE USE OF BUCKETS WHOSE SIZE IS ON AN AVERAGE TWICE AS LARGE AS THE MAIN STORE AND MAKES POSSIBLE ADVANTAGEOUS MINIMIZATION OF SEEK AND LATENCY TIMES.

DEFENSIVE PUBLICATION UNITED STATES PATENT OFFICE

Published at the request of the applicant or owner in accordance with the Notice of Dec. 16, 1969, 869 O.G. 687. The abstracts of Defensive Publication applications are identified by distinctly numbered series and are arranged chronologically. The heading of each abstract indicates the number of pages of specification, including claims and sheets of drawings contained in the application as originally filed. The files of these applications are available to the public for inspection and reproduction may be purchased for 30 cents a sheet.

Defensive Publication applications have not been examined as to the merits of alleged invention. The Patent Office makes no assertion as to the novelty of the disclosed subject matter.

PUBLISHED APRIL 23, 1974 T921,028 SORT PROCESS Brian T. Bennett, Mohegan Lake, and Archie C. McKellar, Mount Kisco, N.Y., assignors to International Business Machines Corporation, Armonk, N.Y.

Continuation of application Ser. No. 208,546, Dec. 16, 1971. This application Sept. 17, 1973, Ser. No. 398,620 Int. Cl. G06F 9/12 U.S. Cl. 444-1 22 Sheets Drawing. 68 Pages Specification

A distribution sort process is provided which results in the distribution of the records of a file into a plurality of buckets such that the distributed records can be recovered in sequential order of key value in one sort pass. In the process, the tags of the records are first sorted into sequential order by key value (a tag including a record's key and its address in the file). The address portions of the tags as they are arranged in the tag sort are then sorted into a set of numbered substrings by a modification of a conventional internal sort method such as replacement selection. The substrings then are merged into a final string. In the merge, the string number is added to the tag, and the key and address portion may be deleted to leave a list of string numbers which lie in the same order as the records in the file. A distribution sort is then performed on the records by distributing them into a quantity of buckets equal to the quantity of substrings produced in the internal sort of the addresses, the buckets being numbered to correspond to the numbering of the substrings. The distribution sort is designed such that the records distributed to a given bucket are those whose addresses are in the substring which has the same number as the bucket. The buckets are arranged in sequential order of numerical value. At the completion of the distribution sort, a single sort pass of the distributed records produces a string of records arranged in sequential order of key value. The distribution phase can be either single- or multi-pass. The invention contemplates an arrangement where the buckets can be visited cyclically for distributions of records thereinto or they can be selected for visitation in accordance with chosen criteria.
The invention enables the use of buckets whose size is on an average twice as large as the main store and makes possible advantageous minimization of seek and latency times.
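
The steps of the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure of the specification: the text here does not say how replacement selection is modified, so the sketch uses conventional replacement selection, and it recovers key order by a swapped-in technique (reading the buckets last to first, each one backwards, while holding `store_size` records and always emitting the largest key). The function names, the `(key, address)` record layout, the numeric keys and the sample data are all hypothetical.

```python
import heapq
from itertools import islice

def replacement_selection(items, store_size):
    """Conventional replacement selection: cut items into ascending strings."""
    source = iter(items)
    # Heap entries are (string number, item); the store holds store_size items.
    store = [(1, item) for item in islice(source, store_size)]
    heapq.heapify(store)
    strings = []
    while store:
        number, item = heapq.heappop(store)
        if number > len(strings):
            strings.append([])
        strings[number - 1].append(item)
        incoming = next(source, None)
        if incoming is not None:
            # An item smaller than the one just output waits for the next string.
            target = number if incoming >= item else number + 1
            heapq.heappush(store, (target, incoming))
    return strings

def distribute(file_keys, store_size):
    """Return buckets of (key, address) records; addresses are 1-based."""
    # Tag sort: tags (key, address) in key order.
    tags = sorted((key, addr) for addr, key in enumerate(file_keys, start=1))
    strings = replacement_selection([addr for _, addr in tags], store_size)
    # Merge the strings by address, carrying the string number along,
    # then keep only the string numbers (now in file order).
    numbered = [[(addr, n) for addr in s] for n, s in enumerate(strings, start=1)]
    string_numbers = [n for _, n in heapq.merge(*numbered)]
    buckets = [[] for _ in strings]
    # One sequential read of the file; each record goes to its numbered bucket.
    for (addr, key), n in zip(enumerate(file_keys, start=1), string_numbers):
        buckets[n - 1].append((key, addr))
    return buckets

def recover_descending(buckets, store_size):
    """Assumed recovery pass: emit records in descending key order."""
    stream = (rec for bucket in reversed(buckets) for rec in reversed(bucket))
    store, out = [], []  # max-heap built from negated (numeric) keys
    for key, addr in stream:
        if len(store) == store_size:
            neg_key, neg_addr = heapq.heappop(store)
            out.append((-neg_key, -neg_addr))
        heapq.heappush(store, (-key, -addr))
    while store:
        neg_key, neg_addr = heapq.heappop(store)
        out.append((-neg_key, -neg_addr))
    return out

# Hypothetical file: the key stored at each of addresses 1..12.
keys = [12, 1, 10, 6, 7, 8, 3, 4, 11, 2, 5, 9]
buckets = distribute(keys, store_size=3)
print([[addr for _, addr in b] for b in buckets])
print([key for key, _ in recover_descending(buckets, store_size=3)])
```

With this sample and a store of three records, the buckets hold addresses `[2, 7, 8, 10, 11]`, `[4, 5, 6, 9, 12]` and `[1, 3]`, and the recovery pass yields keys 12 down to 1. The reverse read works because it retraces the selection process backwards in time, so the store never needs more than `store_size` records even though a bucket can be larger than the store.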

[Drawing sheet 1 of 22 (April 23, 1974; Bennett et al.; T921,028; SORT PROCESS; Original Filed Dec. 16, 1971; inventors named on the sheet: Brian Bennett and Archie C. McKellar). The sheet shows a table of addresses and keys and a flowchart. The table values and one flowchart box are not recoverable from the scan. Legible flowchart steps:]

LOAD MAIN STORE AREA WITH G ITEMS FROM THE INPUT SEQUENCE. NO ITEMS ARE MARKED.

ARE ALL ITEMS MARKED? (NO / YES)

END STRING. UNMARK MARKED ITEMS. START NEXT STRING.

TEST FOR END OF INPUT SEQUENCE? (NO / YES)

COMPARE ITEM FROM INPUT SEQUENCE WITH THE SMALLEST UNMARKED ITEM IN MAIN STORE. IS THE INPUT ITEM LARGER? (NO / YES)

APPEND THE SMALLEST UNMARKED ITEM IN MAIN STORE TO CURRENT OUTPUT STRING; REPLACE IT IN MAIN STORE BY THE INPUT ITEM.

[One box illegible.]

SORT MARKED ITEMS IN THE MAIN STORE AND OUTPUT THEM AS FINAL STRING.

END

[Further drawing sheets: a merge flowchart; FIGS. 5A to 5C (strings 1 to 3); FIGS. 6A and 6B; and a distribution flowchart that uses IND, F, REC(I), PTR(F), BTM(L) and TOP(L). The figure values are not recoverable from the scan. Legible merge flowchart steps:]

INPUT THE FIRST ITEM FROM EACH STRING TO BE MERGED.

SELECT THE SMALLEST ITEM. OUTPUT IT AND REPLACE IT IF POSSIBLE WITH THE NEXT OCCURRING ITEM ON THE SAME STRING.

ANY ITEMS LEFT TO BE MERGED? (NO: END / YES: continue)

[Legible fragments of the distribution flowchart: NEXT BUCKET NUMBER; REC(I) = NEXT RECORD; OUTPUT REC(I) TO BUCKET(L); BTM(L) = TOP(L)?]
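
The merge flowchart on this sheet (input the first item of each string, output the smallest, replace it from the same string, repeat while items remain) can be sketched as below. The sketch also attaches the string number to each output item, as the abstract describes for the merge. The function name and the three sample strings are hypothetical.

```python
def merge_strings(strings):
    """Merge ascending strings; return (item, string number) pairs in order."""
    positions = [0] * len(strings)  # index of the current head of each string
    merged = []
    while True:
        best = None
        # Select the string whose current head item is the smallest.
        for s, string in enumerate(strings):
            if positions[s] < len(string):
                if best is None or string[positions[s]] < strings[best][positions[best]]:
                    best = s
        if best is None:  # no items left to be merged
            break
        merged.append((strings[best][positions[best]], best + 1))
        positions[best] += 1  # replace it with the next item on the same string
    return merged

# Hypothetical strings of addresses, each already in ascending order.
sample = [[2, 3, 4, 5, 9], [1, 6, 7, 8, 11], [10, 12]]
merged = merge_strings(sample)
print([item for item, _ in merged])
print([number for _, number in merged])
```

For the sample, the merged items are 1 through 12 and the string numbers, read in address order, are `[2, 1, 1, 1, 1, 2, 2, 2, 1, 3, 2, 3]`. Dropping the items leaves the list of string numbers that drives the distribution of records into buckets.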

[Drawing sheet 4 of 22 (B. T. Bennett et al., SORT PROCESS): FIGS. 8B to 12B, showing tables of REC, PTR, BTM and TOP values. The table values are not recoverable from the scan.]

[Further drawing sheets (B. T. Bennett et al., SORT PROCESS, Original Filed Dec. 16, 1971): FIGS. 13A to 16B and 21A (tables; values not recoverable from the scan) and the FIG. 17 flowchart. Legible FIG. 17 steps:]

ANY MORE RECORDS?

GET NEXT RECORD.

IS MAIN STORE FULL? (YES / NO)

GO TO NEXT BUCKET IF NECESSARY. IF NO RECORDS, END.

OUTPUT ALL RECORDS FOR CURRENT BUCKET.
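
One possible reading of the FIG. 17 steps is sketched below: records are read into a main store of fixed capacity, and whenever the store fills, every held record destined for the current bucket is written out, moving cyclically to the next bucket only when the current one has nothing waiting. The control flow between boxes is an assumption, since the arrows of the flowchart are not legible, and all names and sample data are hypothetical.

```python
from collections import deque

def distribute_cyclic(records, bucket_numbers, num_buckets, store_size):
    """Distribute records to buckets, visiting buckets cyclically.

    bucket_numbers[i] is the 1-based bucket of records[i].
    Returns (buckets, visits), where visits lists the buckets written to, in order.
    """
    buckets = [[] for _ in range(num_buckets)]
    waiting = [deque() for _ in range(num_buckets)]  # main store, grouped by bucket
    held = 0      # number of records currently in the main store
    current = 0   # index of the current bucket
    visits = []

    def output_current_bucket():
        nonlocal held, current
        # Go to the next bucket if necessary (the current one has nothing waiting).
        while not waiting[current]:
            current = (current + 1) % num_buckets
        visits.append(current + 1)
        held -= len(waiting[current])
        buckets[current].extend(waiting[current])
        waiting[current].clear()

    for record, number in zip(records, bucket_numbers):
        waiting[number - 1].append(record)
        held += 1
        if held == store_size:   # main store full
            output_current_bucket()
    while held:                  # no more records to read: drain the store
        output_current_bucket()
    return buckets, visits

buckets, visits = distribute_cyclic(
    list("abcdefgh"), [1, 2, 1, 3, 2, 1, 3, 2], num_buckets=3, store_size=3)
print(buckets)
print(visits)
```

For the sample, the buckets come out as `[['a', 'c', 'f'], ['b', 'e', 'h'], ['d', 'g']]` after visits to buckets 1, 2, 3, 1, 2. Writing all waiting records for one bucket per visit keeps each bucket in file order and groups the output into fewer, longer transfers.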

[FIGS. 21B to 25C (drawing sheet 9 of 22): tables of REC, PTR, BTM, AR and related values. The values are not recoverable from the scan.]

[Further drawing sheets (B. T. Bennett et al., SORT PROCESS, Original Filed Dec. 16, 1971): FIGS. 26A to 28B (tables of TOP and CT values, not recoverable from the scan) and flowcharts labelled FIG. 31 and FIG. 19 in the scan, followed by the heading of the FIG. 32 sheet. Legible flowchart text:]

INITIALIZATION.

GET NEXT BUCKET NUMBER. GET NEXT RECORD.

MAIN STORE FULL? (NO / YES)

ANY MORE RECORDS TO BE READ?

ANY RECORDS FOR CURRENT BUCKET?

OUTPUT ALL RECORDS FOR CURRENT BUCKET.

ANY RECORDS IN MAIN STORE?

CHOOSE NEW BUCKET.

ANY MORE RECORDS TO BE READ INTO MAIN STORE?

END

[FIGS. 32A and 32B: flowchart using bounds BND(I) computed from UPPR and LWR, the pointers PTR(I) and BTM(J), and a trace of bucket numbers for each pass. Only fragments are legible in the scan, including: NEXT BUCKET NUMBER FROM TRACE ii FOR PASS pp; NEXT TRACE RECORD IN THE TRACE (ii-1)*m+J FOR PASS pp+1; EOF?; REC(I) = NEXT RECORD. The remaining boxes and formulas are not recoverable.]

[Further drawing sheets (B. T. Bennett et al., T921,028, SORT PROCESS, Original Filed Dec. 16, 1971). The sheets diagram a multi-pass distribution: the trace for pass 1 (bucket numbers 1 to 16 in address order); the superbuckets and subtraces for pass 2, produced by pass 1; the superbuckets and subtraces for pass 3, produced by pass 2; and the buckets resulting from pass 3. The bucket-number ranges and bounds (BNDS) are only partly legible. A buffered-read flowchart follows, with these legible steps:]

INITIATE READ OF FIRST BLOCK OF RECORDS INTO BUFFER 1.

INITIATE READ OF NEXT BLOCK OF RECORDS INTO BUFFER 2.

TEST. IS THE READ INTO BUFFER 1 COMPLETE? IF NOT, WAIT UNTIL COMPLETE.

[FIGS. 44 and 45: buffered-read flowcharts. Legible steps:]

INITIATE READ OF NEXT BLOCK OF RECORDS INTO BUFFER W.

W ← 3 - W [apparent reading; the rest of this box is illegible].

TEST. IS READ INTO BUFFER W COMPLETE? IF NOT, WAIT UNTIL COMPLETE.
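
The two-buffer read pattern in these flowcharts (start a read into one buffer, then wait for the other buffer's read to finish before using it) can be sketched with a background thread. This is an illustration only: `read_block` and `process` are hypothetical callables standing in for the device read and the record handling, and the thread pool stands in for asynchronous I/O.

```python
from concurrent.futures import ThreadPoolExecutor

def process_double_buffered(read_block, num_blocks, process):
    """Process blocks 0..num_blocks-1 in order while the next read is in flight."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = [None, None]  # one outstanding read per buffer
        # Initiate reads of the first two blocks into buffers 0 and 1.
        for b in range(min(2, num_blocks)):
            pending[b] = pool.submit(read_block, b)
        w = 0  # buffer whose read we wait on next
        for i in range(num_blocks):
            block = pending[w].result()  # wait until the read into buffer w completes
            process(block)
            if i + 2 < num_blocks:
                # Buffer w is free again: initiate the read of the next block into it.
                pending[w] = pool.submit(read_block, i + 2)
            w = 1 - w  # switch to the other buffer

collected = []
process_double_buffered(
    read_block=lambda i: list(range(3 * i, 3 * i + 3)),  # hypothetical block source
    num_blocks=5,
    process=collected.extend,
)
print(collected)
```

The sketch numbers the buffers 0 and 1, so the switch is `w = 1 - w`; with buffers numbered 1 and 2 the same switch would be written `W = 3 - W`.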

[Drawing sheets 16 to 18 of 22 (B. T. Bennett et al., T921,028, SORT PROCESS, Original Filed Dec. 16, 1971): flowchart for blocked output to buckets, using NB(L), B, RD, H, BTM(L), TOP(L), PTR(I) and F. Legible fragments: OUTPUT PARTIAL BLOCK FROM BUFFER TO BUCKET L; READ PREVIOUS PARTIAL BLOCK IN BUCKET L INTO BUFFER; OUTPUT REC(I) TO BUCKET L; BTM(L) = TOP(L)? The remaining boxes are not recoverable from the scan.]

[Further drawing sheet (SORT PROCESS, Original Filed Dec. 16, 1971): FIGS. 48B to 51B, showing tables of REC, NB, PTR, BTM and AR values. The values are not recoverable from the scan.]

